Conversation

@brouberol
Contributor

Description

We allow the opensearch-operator to watch multiple namespaces.

We keep the original -watch-namespace flag to ensure backwards compatibility. We simply split the value on commas and populate the cache for each namespace in the comma-separated list.

Note: Because the watchNamespace variable was being tested for emptiness before flag.Parse() was being called, it was always empty, causing the operator to always watch all namespaces in the cluster. This is no longer the case.
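
For illustration, the resulting flow looks roughly like the sketch below. This is a simplified sketch rather than a verbatim excerpt of main.go, and it assumes a controller-runtime version that exposes cache.Options.DefaultNamespaces:

package main

import (
    "flag"
    "strings"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
    var watchNamespace string
    flag.StringVar(&watchNamespace, "watch-namespace", "",
        "The comma-separated list of namespaces that the controller manager is restricted to watch. "+
            "If not set, default is to watch all namespaces.")
    // flag.Parse() must run before watchNamespace is inspected,
    // otherwise the variable is still empty at that point.
    flag.Parse()

    cacheOpts := cache.Options{}
    if watchNamespace != "" {
        // Restrict the cache to each namespace listed in the flag value.
        cacheOpts.DefaultNamespaces = map[string]cache.Config{}
        for _, ns := range strings.Split(watchNamespace, ",") {
            cacheOpts.DefaultNamespaces[strings.TrimSpace(ns)] = cache.Config{}
        }
    }

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Cache: cacheOpts})
    if err != nil {
        panic(err)
    }
    _ = mgr // controllers would be registered here before calling mgr.Start(...)
}

With an empty -watch-namespace, DefaultNamespaces stays unset and the manager keeps watching all namespaces, preserving the previous (intended) default.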

I have added documentation in the user guide as well as in the chart values.

Because this change occurs in main.go, for which we don't have unit tests, I'll enclose my manual test notes.

Testing

We first rebuild the operator binary.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ make build
test -s /Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin/controller-gen || GOBIN=/Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/brouberol/code/opensearch-k8s-operator/opensearch-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go build -o bin/manager main.go

We ensure that the new behavior is now available.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ ./bin/manager --help 2>&1 | grep -A 1 watch-namespace
  -watch-namespace string
    	The comma-separated list of namespaces that the controller manager is restricted to watch. If not set, default is to watch all namespaces.

We run the operator alongside a local minikube.

~/code/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ ./bin/manager -watch-namespace ns1,ns2 
{"level":"info","ts":"2025-09-17T16:34:39.456+0200","logger":"setup","msg":"Starting manager"}
{"level":"info","ts":"2025-09-17T16:34:39.457+0200","msg":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2025-09-17T16:34:39.457+0200","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
...

We define a namespace-less cluster resource:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ cat cluster.yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-cluster
spec:
  general:
    serviceName: opensearch-cluster
    version: '3'
  dashboards:
    enable: true
    version: '3'
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"

We create 3 namespaces:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns1
namespace/ns1 created
~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns2
namespace/ns2 created
~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create namespace ns3
namespace/ns3 created

We now create an opensearch cluster in ns1:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns1 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

We start seeing activity in the operator logs:

{"level":"info","ts":"2025-09-17T16:38:19.560+0200","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns1"},"namespace":"ns1","name":"opensearch-cluster","reconcileID":"4e9e94b2-8f25-4832-92bf-3a8e23349e3b","cluster":{"name":"opensearch-cluster","namespace":"ns1"}}
{"level":"info","ts":"2025-09-17T16:38:19.566+0200","msg":"Start reconcile - Phase: PENDING","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns1"},"namespace":"ns1","name":"opensearch-cluster","reconcileID":"4e9e94b2-8f25-4832-92bf-3a8e23349e3b","cluster":{"name":"opensearch-cluster","namespace":"ns1"}}
...

We now create an opensearch cluster in ns2:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns2 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

We start seeing activity in the operator logs, this time related to the cluster in ns2:

{"level":"info","ts":"2025-09-17T16:41:40.313+0200","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns2"},"namespace":"ns2","name":"opensearch-cluster","reconcileID":"2b0e1d2a-9d9b-4cff-b52c-d3b32789b556","cluster":{"name":"opensearch-cluster","namespace":"ns2"}}
{"level":"info","ts":"2025-09-17T16:41:40.324+0200","msg":"Start reconcile - Phase: PENDING","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch-cluster","namespace":"ns2"},"namespace":"ns2","name":"opensearch-cluster","reconcileID":"2b0e1d2a-9d9b-4cff-b52c-d3b32789b556","cluster":{"name":"opensearch-cluster","namespace":"ns2"}}
...

We finally create a cluster in ns3:

~/c/opensearch-k8s-operator/opensearch-operator watch-multiple-ns ?1 ❯ kubectl create -n ns3 -f cluster.yaml
opensearchcluster.opensearch.opster.io/opensearch-cluster created

This time, no log related to the cluster in ns3 is observed in the controller logs.

Chart changes

I render the chart using the default values. The output does not contain the -watch-namespace flag.

~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . | grep watch-namespace
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯

I then inject either a single namespace or multiple namespaces to watch, either as a comma-separated string or as a list, to ensure that the rendering is correct:

~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace="ns1"' | grep watch-namespace
        - --watch-namespace=ns1
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace="ns1,ns2"' | grep watch-namespace
        - --watch-namespace=ns1,ns2
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace=["ns1"]' | grep watch-namespace
        - --watch-namespace=ns1
~/c/opensearch-k8s-operator/c/opensearch-operator watch-multiple-ns ?1 ❯ helm template  . --set-json='manager.watchNamespace=["ns1", "ns2"]' | grep watch-namespace
        - --watch-namespace=ns1,ns2

Issues Resolved

Closes #374

Check List

  • Commits are signed per the DCO using --signoff
  • Unittest added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

Please refer to the PR guidelines before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@inflatador

@prudhvigodithi @swoehrl-mw Greetings! I'm an SRE with the Wikimedia Foundation and I work with @brouberol.

We're rolling out a new OpenSearch environment on K8s in the next month or so and I was wondering if y'all had the cycles to review this change? There's more context in our task tracker if y'all are interested.

Thanks for taking a look and feel free to ping here or in OpenSearch Slack if you have any questions or comments.

@prudhvigodithi
Member

Adding @patelsmit32123 @synhershko to please take a look and add your thoughts.

@synhershko
Collaborator

We're rolling out a new OpenSearch environment on K8s in the next month or so and I was wondering if y'all had the cycles to review this change? There's more context in our task tracker if y'all are interested.

FWIW we are planning a massive release of a 3.0 version of this operator which will be significantly better and safer to use in production.

@synhershko
Collaborator

What is the use-case for doing what you are doing here? I might be missing something

cc @josedev-union

@brouberol
Contributor Author

brouberol commented Oct 23, 2025

What is the use-case for doing what you are doing here?

The use case is mostly deployment convenience (only having to deploy a single operator cluster-wide), as well as aligning with common operator behavior within the Kubernetes ecosystem.

For example, having a single operator being able to watch multiple namespaces is supported by:

Note: We're not actively running all of these operators (only a subset), but I sampled actively maintained operator codebases and documentation to showcase that this is a common behavior.

My point (and IMHO the general sentiment over at #374) is that this behavior is expected by operator users and deployers, as it's become quite standard.

It is something that is natively supported by the operator SDK, and it does not take anything away from the current operator behavior, as it only adds the ability for the operator to manage clusters across one to many namespaces, instead of a single one at the moment.

I hope this clears things up.

Note: I just realized that for this patch to be complete, it still needs to iterate over the watched namespaces to set up the appropriate roles and role bindings in each of them. I'm happy to send that work over if the intent of the feature request is approved.

@synhershko
Collaborator

Got it. Happy to merge this once conflicts are resolved and CI is green.

Contributor

@josedev-union josedev-union left a comment


  • First, please resolve conflicts.
  • Second,
    ## If this is set to true, RoleBindings will be used instead of ClusterRoleBindings, inorder to restrict ClusterRoles
    ## to the namespace where the operator and OpenSearch cluster are in. In that case, specify the namespace where they
    ## are in in manager.watchNamespace field.
    ## If false, ClusterRoleBindings will be used
    useRoleBindings: false

    This is not directly linked to your changes but good to have in the same scope.
    useRoleBindings should be enabled only when watchNamespace is equal to the release namespace. If not, then we need to use ClusterRoleBindings.
    Please update the helm doc for the useRoleBindings part accordingly and fix the typo in the current one.

Contributor

@josedev-union josedev-union left a comment


There's a critical issue when watchNamespace is not set (empty string).

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

@josedev-union I'm confused by the requested useRoleBindings change.

The chart defines 2 ClusterRoles:

  • {{ include "opensearch-operator.fullname" . }}-{{ .Release.Namespace }}-proxy-role
  • {{ include "opensearch-operator.fullname" . }}-{{ .Release.Namespace }}-manager-role

If we set manager.watchNamespace to, say, [ns1, ns2], we probably want to use RoleBindings to bind each of these ClusterRoles within each of these 2 namespaces, don't we? The alternative would be to use a ClusterRoleBinding, which would grant the opensearch operator access to, amongst other things, Secret resources in all namespaces.

Cf https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding

A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding. If you want to bind a ClusterRole to all the namespaces in your cluster, you use a ClusterRoleBinding.

(emphasis mine)

The way I see it, the logic should be to use RoleBindings as soon as manager.watchNamespace is non-empty. If it is empty, then the controller would watch all namespaces, and would thus require a ClusterRoleBinding.
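
To make this concrete, here is a rough sketch (purely illustrative; the ClusterRole and ServiceAccount names below are assumptions, not taken from the chart) of the RoleBinding objects that would be needed, one per watched namespace, each referencing the shared operator ClusterRole:

package main

import (
    "fmt"
    "strings"

    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/yaml"
)

// roleBindingsFor builds one RoleBinding per watched namespace, each granting
// the operator ServiceAccount the permissions of the shared ClusterRole, but
// scoped to that namespace only.
func roleBindingsFor(watchNamespaces, clusterRole, saName, saNamespace string) []rbacv1.RoleBinding {
    var bindings []rbacv1.RoleBinding
    for _, ns := range strings.Split(watchNamespaces, ",") {
        ns = strings.TrimSpace(ns)
        bindings = append(bindings, rbacv1.RoleBinding{
            TypeMeta:   metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "RoleBinding"},
            ObjectMeta: metav1.ObjectMeta{Name: clusterRole, Namespace: ns},
            RoleRef: rbacv1.RoleRef{
                APIGroup: "rbac.authorization.k8s.io",
                Kind:     "ClusterRole",
                Name:     clusterRole,
            },
            Subjects: []rbacv1.Subject{{
                Kind:      "ServiceAccount",
                Name:      saName,
                Namespace: saNamespace,
            }},
        })
    }
    return bindings
}

func main() {
    // Hypothetical names, for illustration only.
    for _, rb := range roleBindingsFor("ns1,ns2", "opensearch-operator-manager-role", "opensearch-operator", "opensearch-operator") {
        out, _ := yaml.Marshal(rb)
        fmt.Printf("---\n%s", out)
    }
}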

WDYT?

@josedev-union
Contributor

  • This is not directly linked to your changes but good to have in the same scope.
    useRoleBindings should be enabled only when watchNamespace is equal to the release namespace. If not, then we need to use clusterRoleBindings.
    Please update the helm doc of useRoleBindings part properly and fix typo in the current one.

Yes, we need to use a ClusterRoleBinding in such cases.
What I requested is to update the helm docs in the values file to explain that clearly.
The current helm docs are incorrect: we can use a RoleBinding only when watchNamespace is a single namespace and it is equal to the release namespace, but right now they state that a RoleBinding can be used whenever watchNamespace is specified.

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

So, this is where I'm not sure I agree (or maybe we agree and I'm just misunderstanding).

If you watch specific namespaces, by having a non-empty manager.watchNamespace, then the chart should use a RoleBinding to bind the operator ClusterRole in each of the watched namespaces.

The RoleBinding / ClusterRoleBinding decision should be:

flowchart TD
    B{Is manager.watchNamespace empty?}
    B -- Yes --> C[Use a ClusterRoleBinding to give permissions to the operator on *all* namespaces]
    B -- No ----> D[Use a RoleBinding referencing the operator ClusterRole in each of the watched namespaces]

Happy to hear your thoughts.

@josedev-union
Contributor

@brouberol nah, it is much more than my original thought. :)
#1101 (review)
I just recommend updating the comments in the helm chart values file, like this:

## If this is set to true, RoleBindings will be used instead of ClusterRoleBindings, in order to restrict ClusterRoles
## to the namespace where the operator and OpenSearch cluster are in.
## You need to set the release namespace as manager.watchNamespace
useRoleBindings: false

@brouberol
Contributor Author

brouberol commented Oct 29, 2025

Ok, I think I pinpointed where our misunderstanding is coming from. Let me know if I got this right.

The opensearch operator is usually deployed in the same namespace as the opensearch cluster. When that is the case, we can use a RoleBinding, because the Role will be part of the same namespace as the cluster.
When that is not the case, the current chart design is to use a ClusterRole for the operator and a ClusterRoleBinding to grant the operator permissions in all namespaces.

In that light, I understand your comment: having useRoleBindings: true would only work if {{ .Release.Namespace }} == {{ $watchedNamespace }}. If this is how the chart wants things to work, then sure, I'll update the comment.

What I was pushing towards, though, is considering that the operator can be run from its own namespace (as it is common in the ecosystem to run operators either from kube-system or from dedicated namespaces, such as opensearch-operator in our case). The operator would define its permissions via a ClusterRole, and those permissions would be bound to the watched namespaces only, via a RoleBinding in each of these namespaces ([ns1, ns2] in the following diagram).

classDiagram

    ClusterRole <|-- RoleBindingNS1
    ClusterRole <|-- RoleBindingNS2
    
    class ClusterRole {
        name: opensearch-operator-manager-role
        permissions: [...]
    }
    class RoleBindingNS1 {
        namespace: ns1
        ---
        roleRef.apiGroup: rbac.authorization.k8s.io
        roleRef.kind: ClusterRole
        roleRef.name: opensearch-operator-manager-role
        subjects[0].kind: ServiceAccount
        subjects[0].name: opensearch-operator
        subjects[0].namespace: opensearch-operator
    }
    class RoleBindingNS2 {
        namespace: ns2
        ---
        roleRef.apiGroup: rbac.authorization.k8s.io
        roleRef.kind: ClusterRole
        roleRef.name: opensearch-operator-manager-role
        subjects[0].kind: ServiceAccount
        subjects[0].name: opensearch-operator
        subjects[0].namespace: opensearch-operator
    }

The main reason to do this would be to avoid granting the opensearch-operator the ability to read/write/delete/update Secret resources in all namespaces, even those it is not watching, which is as large a security risk as you can get.

I'm happy to get told "let's punt this to another PR" and I'll just update the comment. However, let's be aware that in its current state, the operator permissions are wide open.

Linking back to the original message of #374, we see:

For security reasons we are not able to use the clusterrolebinding and have to use namespaced rolebindings instead. This means we can only use the watchNamespace mode of deployment which obviously means that we would have to install multiple operators for multiple clusters in separate namespaces.

This is the security issue they're mentioning.

@josedev-union
Contributor

I'm happy to get told "let's punt this to another PR" and I'll just update the comment. However, let's be aware that in its current state, the operator permissions are wide open.

I prefer to open a new issue for this.
Normally, operators have a boolean flag like watchGlobal and simply switch between a ClusterRoleBinding and a RoleBinding. In that case, the current implementation is ok. But now that we specify a list of namespaces, we need to revisit this topic to follow the principle of least privilege (PoLP).

@synhershko synhershko merged commit 408c100 into opensearch-project:main Nov 3, 2025
11 checks passed